Reinforcement Learning with Reusing Mechanism of Avoidance Actions and its Application to Learning Whole-Body Motions of Multi-Link Robot

Author

  • Akihiko Yamaguchi
Abstract

Acquiring a motion by learning from its objective alone carries a large cost, such as damage from falling over and a large number of trials, when the motion is complex, such as a jumping serve. Reusing knowledge already learnt is an essential mechanism for learning such motions efficiently, as humans do. In this study, we propose to use a decomposition of action-value functions as a reusing mechanism for reinforcement learning. Avoidance actions that are assumed invariant across different tasks (e.g. avoiding falling over) are learnt separately from primary actions assumed to be task-specific. The action-value function for the avoidance actions is then reused in learning new tasks. Furthermore, we extend the method for multi-link robots to learn whole-body motions. This learning method has been applied to moving tasks in both discrete and continuous planes, and also to a tennis-serve task and a jump task of a 4-link robot. The simulation results demonstrate that, in the moving tasks, reusing avoidance actions enables the agent to avoid the bad actions effectively. In learning whole-body motions, reusing avoidance actions obtained in learning a jump halves the total falling damage in learning a serve, compared to learning without reuse.
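The reuse idea in the abstract can be illustrated with a minimal tabular sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the additive form (total value = task table + avoidance table), the five-state corridor, the names `q_task`, `q_avoid`, `greedy_action` and `td_update`, and the SARSA-style update. The abstract does not state the exact decomposition used.

```python
import numpy as np

# Hypothetical toy problem: a corridor with 5 states and 2 actions
# (0 = step left, 1 = step right). Not the paper's environment.
N_STATES, N_ACTIONS = 5, 2


def greedy_action(q_task, q_avoid, state):
    """Pick the action that maximises the sum of the two value tables.

    Assumption: the decomposition is additive.
    """
    return int(np.argmax(q_task[state] + q_avoid[state]))


def td_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """SARSA-style update of ONE component table, in place.

    Each component would be updated with its own reward signal
    (task reward for q_task, avoidance penalty for q_avoid).
    """
    q[s, a] += alpha * (r + gamma * q[s_next, a_next] - q[s, a])


# Avoidance table assumed to have been learnt in an EARLIER task:
# stepping left from state 1 leads to a "fall", so it carries a penalty.
q_avoid = np.zeros((N_STATES, N_ACTIONS))
q_avoid[1, 0] = -1.0

# A NEW task starts with an empty task table but reuses q_avoid as-is.
q_task = np.zeros((N_STATES, N_ACTIONS))

# Before any task learning, the agent already steers away from the fall.
first_choice = greedy_action(q_task, q_avoid, state=1)
print(first_choice)
```

In this sketch the new task has learnt nothing yet, but the reused avoidance table already makes the penalised action unattractive, which is the kind of effect the abstract reports for the moving tasks.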

Similar resources

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

Dynamic Obstacle Avoidance with PEARL: PrEference Appraisal Reinforcement Learning

Manual derivation of optimal robot motions for task completion is difficult, especially when a robot is required to balance its actions between opposing preferences. One solution has been to automatically learn near-optimal motions with Reinforcement Learning (RL). This has been successful for several tasks including swing-free UAV flight, table tennis, and autonomous driving. However, high-dim...

Voltage Coordination of FACTS Devices in Power Systems Using RL-Based Multi-Agent Systems

This paper describes how multi-agent system technology can be used as the underpinning platform for voltage control in power systems. In this study, some FACTS (flexible AC transmission systems) devices are properly designed to coordinate their decisions and actions in order to provide a coordinated secondary voltage control mechanism based on multi-agent theory. Each device here is modeled as ...

An Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network

The RoboCup competition, as a great test-bed, has become a popular domain worldwide in recent years. The main objective of such competitions is to deal with the complex behavior of systems which consist of multiple autonomous agents. The rich experience of human soccer players can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and are a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...

Publication date: 2008